Verifying pronunciation dictionaries using conflict analysis

Authors

  • Marelie H. Davel
  • Febe de Wet
Abstract

We describe a new language-independent technique for automatically identifying errors in an electronic pronunciation dictionary by analyzing the source of conflicting patterns directly. We evaluate the effectiveness of the technique in two ways: we perform a controlled experiment using artificially corrupted data (allowing us to measure precision and recall exactly); and then apply the technique to a real-world pronunciation dictionary, demonstrating its effectiveness in practice. We also introduce a new freely available pronunciation resource (the RCRL Afrikaans Pronunciation Dictionary), the largest such dictionary that currently exists.
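
The controlled experiment mentioned above relies on knowing exactly which entries were corrupted, so precision and recall can be computed by comparing the flagged entries against the injected errors. The following is a minimal sketch of that bookkeeping only, not the authors' conflict-analysis method; the function name `precision_recall` and the example word lists are hypothetical and chosen purely for illustration.

```python
def precision_recall(flagged, corrupted):
    """Compare flagged dictionary entries against known injected errors.

    flagged:   words an error detector marked as suspicious (hypothetical input)
    corrupted: words whose pronunciations were deliberately corrupted
    """
    flagged, corrupted = set(flagged), set(corrupted)
    true_positives = len(flagged & corrupted)
    # Precision: fraction of flagged entries that really were corrupted.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: fraction of corrupted entries that were flagged.
    recall = true_positives / len(corrupted) if corrupted else 0.0
    return precision, recall


# Illustrative data only: four injected errors, five flagged words.
corrupted = {"kat", "hond", "boom", "huis"}
flagged = {"kat", "hond", "boom", "water", "son"}

precision, recall = precision_recall(flagged, corrupted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the corruption is artificial, the set of true errors is known in full, which is what makes both measures exact rather than estimated.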

Similar resources

Error analysis of a public domain pronunciation dictionary

We explore pattern recognition techniques for verifying the correctness of a pronunciation lexicon, focusing on techniques that require limited human interaction. We evaluate the British English Example Pronunciation (BEEP) dictionary [1], a popular public domain resource that is widely used in English speech processing systems. The techniques being investigated are applied to the lexicon and t...

The efficient generation of pronunciation dictionaries: human factors during bootstrapping

Bootstrapping techniques have significant potential for the efficient generation of linguistic resources such as electronic pronunciation dictionaries. We describe a system and an approach to bootstrapping for the development of such dictionaries, and report on experiments conducted to investigate the efficiency and effectiveness of the system, focusing on the human factors that influence the p...

Learning Pronunciation Rules for English Graphemes Using the Version Space Algorithm

We describe a technique for learning pronunciation rules based on the Version Space algorithm. In particular, we describe how to learn pronunciation rules for a representative subset of the English graphemes. We present a learning procedure called LEP-G.1 (learning to pronounce English graphemes) that learns English pronunciation rules from examples in the form of word-pronunciation pairs. With...

Automatic Generation of Pronunciation Dictionaries

In this report we will describe a data driven approach for creating pronunciation dictionaries for a new unseen target language by voting among phoneme recognizers in nine different languages other than the target language. In this process recordings of the new language that are transcribed on word level are decoded by the phoneme recognizers. This results in a hypothesis of nine phonemes per t...

Flexible Parameter Tying for Conversational Speech Recognition

Modeling pronunciation variation is key for recognizing conversational speech. Previous efforts on pronunciation modeling by modifying dictionaries only yielded marginal improvement. Due to complex interaction between dictionaries and acoustic models, we believe a pronunciation modeling scheme is plausible only when closely coupled with the underlying acoustic model. This paper explores the use...

Publication date: 2010